transformer-based model
- Pacific Ocean > North Pacific Ocean > San Francisco Bay (0.04)
- North America > United States > California > San Francisco County > San Francisco (0.04)
- North America > United States > Alabama (0.04)
- Health & Medicine (0.68)
- Banking & Finance (0.67)
- Energy > Renewable (0.46)
- Information Technology > Artificial Intelligence > Vision (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
- Information Technology > Data Science > Data Mining (0.83)
- Information Technology > Artificial Intelligence > Natural Language (0.69)
Supplementary Material for CrossGNN: Confronting Noisy Multivariate Time Series Via Cross Interaction Refinement
... correlation mechanism to capture cross-time dependency for forecasting ... the dimension of the channel is set to 16 based on efficiency considerations ... Weather, and the look-back window size is set as 96 ... Proposition 2. The time and space complexity for the Cross-variable GNN is ... Frequency enhanced decomposed transformer for long-term series forecasting.
- Pacific Ocean > North Pacific Ocean > San Francisco Bay (0.04)
- North America > United States > California > San Francisco County > San Francisco (0.04)
- Europe > Germany (0.04)
- North America > Canada > Ontario > Toronto (0.05)
- North America > United States (0.04)
- Asia > China > Shanghai > Shanghai (0.04)
LCM: Locally Constrained Compact Point Cloud Model for Masked Point Modeling
Pre-trained point cloud models based on Masked Point Modeling (MPM) have exhibited substantial improvements across various tasks. However, these models heavily rely on the Transformer, leading to quadratic complexity and a limited decoder, hindering their practical application. To address this limitation, we first conduct a comprehensive analysis of existing Transformer-based MPM methods, emphasizing the idea that redundancy reduction is crucial for point cloud analysis. To this end, we propose a Locally constrained Compact point cloud Model (LCM) consisting of a locally constrained compact encoder and a locally constrained Mamba-based decoder. Our encoder replaces self-attention with local aggregation layers to achieve an elegant balance between performance and efficiency. Considering the varying information density between masked and unmasked patches in the decoder inputs of MPM, we introduce a locally constrained Mamba-based decoder. This decoder ensures linear complexity while maximizing the perception of point cloud geometry information from unmasked patches, which have higher information density. Extensive experimental results show that our compact model significantly surpasses existing Transformer-based models in both performance and efficiency; in particular, our LCM-based Point-MAE achieves improvements of 1.84%, 0.67%, and 0.60% on the three variants of ScanObjectNN over its Transformer-based counterpart, while reducing parameters by 88% and computation by 73%.
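To make the encoder idea concrete, the sketch below shows one way a locally constrained aggregation layer could replace self-attention over point-cloud patch tokens: each token pools features only from its k nearest patch centers rather than attending to all tokens. All names, shapes, and hyperparameters are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch of a locally constrained aggregation layer in the spirit of LCM's
# encoder: each patch token aggregates features only from its k nearest patch centers,
# replacing full self-attention with a fixed-size local neighborhood.
# All names and hyperparameters here are illustrative, not taken from the paper's code.
import torch
import torch.nn as nn


class LocalAggregationLayer(nn.Module):
    def __init__(self, dim: int, k: int = 8):
        super().__init__()
        self.k = k
        # Pointwise MLP applied to (neighbor feature, relative center offset) pairs.
        self.mlp = nn.Sequential(
            nn.Linear(dim + 3, dim),
            nn.GELU(),
            nn.Linear(dim, dim),
        )
        self.norm = nn.LayerNorm(dim)

    def forward(self, feats: torch.Tensor, centers: torch.Tensor) -> torch.Tensor:
        # feats:   (B, N, C) patch token features
        # centers: (B, N, 3) patch center coordinates
        B, N, C = feats.shape
        # Pairwise distances between patch centers, then k nearest neighbors per token.
        dist = torch.cdist(centers, centers)                  # (B, N, N)
        knn_idx = dist.topk(self.k, largest=False).indices    # (B, N, k)
        batch_idx = torch.arange(B, device=feats.device).view(B, 1, 1)
        nbr_feats = feats[batch_idx, knn_idx]                 # (B, N, k, C)
        nbr_centers = centers[batch_idx, knn_idx]             # (B, N, k, 3)
        rel_pos = nbr_centers - centers.unsqueeze(2)          # relative offsets
        msg = self.mlp(torch.cat([nbr_feats, rel_pos], dim=-1))
        # Max-pool over the local neighborhood and add as a residual update.
        out = feats + msg.max(dim=2).values
        return self.norm(out)


if __name__ == "__main__":
    layer = LocalAggregationLayer(dim=64, k=8)
    feats = torch.randn(2, 128, 64)     # 128 patch tokens per cloud
    centers = torch.randn(2, 128, 3)
    print(layer(feats, centers).shape)  # torch.Size([2, 128, 64])
```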
DeformableTST: Transformer for Time Series Forecasting without Over-reliance on Patching
With the proposal of the patching technique in time series forecasting, Transformer-based models have achieved compelling performance and gained great interest from the time series community. At the same time, however, we observe a new problem: recent Transformer-based models are overly reliant on patching to achieve ideal performance, which limits their applicability to forecasting tasks unsuitable for patching. In this paper, we intend to handle this emerging issue. By diving into the relationship between patching and full attention (the core mechanism in Transformer-based models), we find that the reason behind this issue is that full attention relies overly on the guidance of patching to focus on the important time points and learn non-trivial temporal representations. Based on this finding, we propose DeformableTST as an effective solution to this emerging issue. Specifically, we propose deformable attention, a sparse attention mechanism that can better focus on the important time points by itself, to remove the need for patching. We also adopt a hierarchical structure to alleviate the efficiency issue caused by the removal of patching.
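As a rough illustration of the deformable-attention idea, the sketch below lets each query token predict a few fractional sampling offsets along the time axis, gathers values at those positions by linear interpolation, and attends only over the sampled points. The module name, offset parameterization, and sizes are assumptions; this is not the authors' code.

```python
# A minimal, hypothetical sketch of deformable attention over a 1-D time series: each
# query predicts a small set of sampling offsets, values are gathered at those fractional
# time positions by linear interpolation, and attention is computed only over the sampled
# points instead of over all time steps. Names and hyperparameters are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


class DeformableTimeAttention(nn.Module):
    def __init__(self, dim: int, n_points: int = 4):
        super().__init__()
        self.n_points = n_points
        self.offset_proj = nn.Linear(dim, n_points)   # per-query sampling offsets
        self.attn_proj = nn.Linear(dim, n_points)     # per-query attention logits
        self.value_proj = nn.Linear(dim, dim)
        self.out_proj = nn.Linear(dim, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, T, C) time-point (or patch) tokens
        B, T, C = x.shape
        v = self.value_proj(x)                                      # (B, T, C)
        base = torch.arange(T, device=x.device, dtype=x.dtype)      # reference positions
        offsets = self.offset_proj(x)                               # (B, T, P)
        pos = (base.view(1, T, 1) + offsets).clamp(0, T - 1)        # sampled positions
        # Linear interpolation of values at fractional positions.
        lo = pos.floor().long()
        hi = pos.ceil().long()
        w = (pos - lo.to(pos.dtype)).unsqueeze(-1)                  # (B, T, P, 1)
        batch_idx = torch.arange(B, device=x.device).view(B, 1, 1)
        v_lo = v[batch_idx, lo]                                     # (B, T, P, C)
        v_hi = v[batch_idx, hi]
        sampled = (1 - w) * v_lo + w * v_hi
        attn = F.softmax(self.attn_proj(x), dim=-1).unsqueeze(-1)   # (B, T, P, 1)
        out = (attn * sampled).sum(dim=2)                           # (B, T, C)
        return self.out_proj(out)


if __name__ == "__main__":
    layer = DeformableTimeAttention(dim=32, n_points=4)
    series = torch.randn(8, 96, 32)   # batch of 96-step sequences
    print(layer(series).shape)        # torch.Size([8, 96, 32])
```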
Pairwise Causality Guided Transformers for Event Sequences
Although pairwise causal relations have been extensively studied in observational longitudinal analyses across many disciplines, incorporating knowledge of causal pairs into deep learning models for temporal event sequences remains largely unexplored. In this paper, we propose a novel approach for enhancing the performance of transformer-based models in multivariate event sequences by injecting pairwise qualitative causal knowledge such as "event Z amplifies future occurrences of event Y". We establish a new framework for causal inference in temporal event sequences using a transformer architecture, providing a theoretical justification for our approach, and show how to obtain unbiased estimates of the proposed measure. Experimental results demonstrate that our approach outperforms several state-of-the-art models in terms of prediction accuracy by effectively leveraging knowledge about causal pairs. We also consider a unique application where we extract knowledge around sequences of societal events by generating them from a large language model, and demonstrate how a causal knowledge graph can help with event prediction in such sequences. Overall, our framework offers a practical means of improving the performance of transformer-based models in multivariate event sequences by explicitly exploiting pairwise causal information.
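As an illustration of how such a pairwise prior might be exposed to a transformer, the sketch below adds a signed bias to the attention logits between cause and effect event types. This is a hypothetical construction for intuition only; the class name, bias form, and hyperparameters are assumptions and do not reproduce the paper's mechanism or its unbiasedness argument.

```python
# One plausible (hypothetical) way to expose qualitative causal pairs such as
# "event Z amplifies future occurrences of event Y" to a transformer over event sequences:
# add a signed bias to the attention logits whenever the query token's event type is the
# effect and the key token's event type is the cause. Illustrative sketch only.
import torch
import torch.nn as nn
import torch.nn.functional as F


class CausalPairBiasedAttention(nn.Module):
    def __init__(self, n_event_types: int, dim: int, causal_pairs):
        # causal_pairs: iterable of (cause_type, effect_type, sign) with sign in {+1, -1}
        super().__init__()
        self.q = nn.Linear(dim, dim)
        self.k = nn.Linear(dim, dim)
        self.v = nn.Linear(dim, dim)
        self.scale = dim ** -0.5
        self.bias_strength = nn.Parameter(torch.tensor(1.0))  # learnable prior strength
        prior = torch.zeros(n_event_types, n_event_types)
        for cause, effect, sign in causal_pairs:
            prior[effect, cause] = float(sign)                 # effect attends to cause
        self.register_buffer("prior", prior)

    def forward(self, h: torch.Tensor, types: torch.Tensor) -> torch.Tensor:
        # h: (B, L, C) event embeddings; types: (B, L) integer event-type ids
        logits = (self.q(h) @ self.k(h).transpose(-2, -1)) * self.scale   # (B, L, L)
        pair_bias = self.prior[types.unsqueeze(2), types.unsqueeze(1)]    # (B, L, L)
        logits = logits + self.bias_strength * pair_bias
        # Left-to-right mask: an event may only attend to itself and earlier events.
        L = h.size(1)
        mask = torch.triu(torch.ones(L, L, dtype=torch.bool, device=h.device), diagonal=1)
        logits = logits.masked_fill(mask, float("-inf"))
        return F.softmax(logits, dim=-1) @ self.v(h)


if __name__ == "__main__":
    # Suppose type 3 ("Z") amplifies future occurrences of type 1 ("Y").
    attn = CausalPairBiasedAttention(n_event_types=5, dim=16, causal_pairs=[(3, 1, +1)])
    h = torch.randn(2, 10, 16)
    types = torch.randint(0, 5, (2, 10))
    print(attn(h, types).shape)   # torch.Size([2, 10, 16])
```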
H3T: Efficient Integration of Memory Optimization and Parallelism for Large-scale Transformer Training
Brain Network Transformer
Human brains are commonly modeled as networks of Regions of Interest (ROIs) and their connections for the understanding of brain functions and mental disorders. Recently, Transformer-based models have been studied over different types of data, including graphs, and have been shown to bring broad performance gains. In this work, we study Transformer-based models for brain network analysis. Driven by the unique properties of the data, we model brain networks as graphs with a fixed number and ordering of nodes, which allows us to (1) use connection profiles as node features to provide natural and low-cost positional information and (2) learn pair-wise connection strengths among ROIs with efficient attention weights across individuals that are predictive for downstream analysis tasks. Moreover, we propose an Orthonormal Clustering Readout operation based on self-supervised soft clustering and orthonormal projection. This design accounts for the underlying functional modules that determine similar behaviors among groups of ROIs, leading to distinguishable cluster-aware node embeddings and informative graph embeddings. Finally, we re-standardize the evaluation pipeline on the only publicly available large-scale brain network dataset, ABIDE, to enable meaningful comparison of different models. Experimental results show clear improvements of our proposed Brain Network Transformer on both the public ABIDE and our restricted ABCD datasets.
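The sketch below illustrates one plausible form of an orthonormal clustering readout: learnable cluster centers are orthonormalized, ROI embeddings are softly assigned to them, and cluster-pooled features are concatenated into a graph embedding. The self-supervised clustering objective and other specifics of the paper's readout are not reproduced; names and sizes are illustrative.

```python
# Hypothetical sketch of an orthonormal-clustering-style readout for ROI node embeddings:
# learnable cluster centers are orthonormalized (here via QR), nodes are softly assigned
# to clusters, and the graph embedding is the concatenation of cluster-pooled features.
# This is an illustration of the idea in the abstract, not the paper's implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F


class OrthonormalClusteringReadout(nn.Module):
    def __init__(self, dim: int, n_clusters: int):
        super().__init__()
        assert n_clusters <= dim, "need at most `dim` orthonormal centers"
        self.centers = nn.Parameter(torch.randn(n_clusters, dim))

    def forward(self, node_emb: torch.Tensor) -> torch.Tensor:
        # node_emb: (B, N, C) embeddings of N ROIs per subject
        # Orthonormalize the cluster centers with a QR decomposition.
        q, _ = torch.linalg.qr(self.centers.t())      # (C, K), orthonormal columns
        ortho_centers = q.t()                         # (K, C)
        # Soft assignment of each node to each orthonormal cluster center.
        assign = F.softmax(node_emb @ ortho_centers.t(), dim=-1)   # (B, N, K)
        # Cluster-aware pooling: weighted average of node embeddings per cluster.
        pooled = assign.transpose(1, 2) @ node_emb                 # (B, K, C)
        pooled = pooled / (assign.sum(dim=1).unsqueeze(-1) + 1e-6)
        return pooled.flatten(1)                                   # (B, K * C)


if __name__ == "__main__":
    readout = OrthonormalClusteringReadout(dim=64, n_clusters=8)
    rois = torch.randn(4, 200, 64)        # e.g., 200 ROIs per brain network
    print(readout(rois).shape)            # torch.Size([4, 512])
```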
- Health & Medicine > Therapeutic Area > Neurology (1.00)
- Health & Medicine > Health Care Technology (1.00)
Bangla Hate Speech Classification with Fine-tuned Transformer Models
Jafari, Yalda Keivan, Dey, Krishno
Hate speech recognition in low-resource languages remains a difficult problem due to insufficient datasets, orthographic heterogeneity, and linguistic variety. Bangla is spoken by more than 230 million people in Bangladesh and India (West Bengal). Despite the growing need for automated moderation on social media platforms, Bangla is significantly under-represented in computational resources. In this work, we study Subtask 1A and Subtask 1B of the BLP 2025 Shared Task on hate speech detection. We reproduce the official baselines (e.g., Majority, Random, Support Vector Machine) and add Logistic Regression, Random Forest, and Decision Tree as further baselines. We also fine-tune transformer-based models such as DistilBERT, BanglaBERT, m-BERT, and XLM-RoBERTa for hate speech classification. All of the transformer-based models except DistilBERT outperform the baseline methods on both subtasks. Among the transformer-based models, BanglaBERT produces the best performance for both subtasks. Despite being smaller in size, BanglaBERT outperforms both m-BERT and XLM-RoBERTa, which suggests that language-specific pre-training is highly important. Our results highlight both the potential of and the need for pre-trained language models for the low-resource Bangla language.
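A minimal fine-tuning sketch along these lines, using Hugging Face Transformers, is given below. The checkpoint identifier, label count, file names, and hyperparameters are assumptions for illustration and are not the shared-task configuration.

```python
# Minimal fine-tuning sketch for Bangla hate speech classification with a pre-trained
# encoder via Hugging Face Transformers. The checkpoint name, label count, and file paths
# are assumptions; the shared-task data and official splits are not shown.
from datasets import load_dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          DataCollatorWithPadding, Trainer, TrainingArguments)

MODEL_NAME = "csebuetnlp/banglabert"    # assumed BanglaBERT checkpoint on the Hub
NUM_LABELS = 2                          # e.g., hate vs. non-hate for a binary subtask

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME, num_labels=NUM_LABELS)

# Hypothetical CSV files with `text` and `label` columns.
data = load_dataset("csv", data_files={"train": "train.csv", "validation": "dev.csv"})

def tokenize(batch):
    # Tokenize the raw text; padding is applied per-batch by the data collator below.
    return tokenizer(batch["text"], truncation=True, max_length=128)

data = data.map(tokenize, batched=True)

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="banglabert-hate-speech",
        learning_rate=2e-5,
        per_device_train_batch_size=16,
        num_train_epochs=3,
    ),
    train_dataset=data["train"],
    eval_dataset=data["validation"],
    data_collator=DataCollatorWithPadding(tokenizer),
)
trainer.train()
```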
- Asia > India > West Bengal (0.24)
- Asia > Bangladesh (0.24)
- North America > Canada > New Brunswick > Fredericton (0.04)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)